Journal of the Database Research Society (SIGDB)
Korean Title |
A Sequence Alignment Algorithm Using the Graph-Based Distributed Processing System Trinity |
English Title |
SAG: Sequence Alignment Algorithm based on Graph with distributed system Trinity |
Author |
Jun-Su Lee (이준수)
Yun-Ku Yeu (여윤구)
Hong-Chan Roh (노홍찬)
Young-Mi Yoon (윤영미)
Sang-Hyun Park (박상현)
|
Citation |
Vol. 30, No. 1, pp. 17-28 (April 2014) |
Korean Abstract |
Sequence alignment is among the most widely used techniques in genomics. With advances in next-generation sequencing (NGS) technology, the volume of sequence read data has recently increased sharply. Many sequence alignment algorithms have been developed to process this rapidly growing NGS data. However, these algorithms require heavy computation to handle repeats and polymorphisms. As a result, existing sequence alignment algorithms face a trade-off between throughput and alignment quality. By contrast, alignment algorithms running on distributed processing systems such as Hadoop and Trinity sacrifice less alignment quality and achieve higher throughput than algorithms running on a single machine. In this paper, we propose SAG (Sequence Alignment Algorithm based on Graph with Trinity), a sequence alignment algorithm that runs on Trinity, the graph-based in-memory distributed system proposed by Microsoft. We transformed the reference sequence into graph-structured data and then added new edges between adjacent nodes that can be connected in the graph. To perform alignment that tolerates polymorphism, we obtained candidates by combining sequence fragments. Finally, we performed glocal alignment on the candidates to find the final results. In experiments, SAG achieved similar or better alignment quality than existing Hadoop-based algorithms while also achieving considerably higher throughput. We also demonstrated scalability: adding machines yields higher throughput.
|
English Abstract |
Sequence alignment is one of the most widely used tools in genomics. Since the development of next-generation sequencing (NGS) technology, the production of sequence read data has increased dramatically, and a number of sequence alignment algorithms have been developed to process these data. However, these algorithms suffer from a trade-off between throughput and alignment quality, because handling repeats and polymorphisms incurs a large computational cost. In contrast, alignment algorithms that run on distributed systems such as Hadoop and Trinity can obtain higher throughput than existing single-machine algorithms without compromising alignment quality. In this paper, we propose SAG, a graph-based sequence alignment algorithm that runs on Trinity, the in-memory distributed system proposed by Microsoft. We transform the reference sequence into a graph and add new edges between adjacent nodes that can be connected. We then combine sequence fragments to obtain candidates that allow for polymorphism. Finally, we perform glocal alignment on the obtained candidates to find the final results. Our experimental results show that SAG achieves higher throughput than existing Hadoop-based algorithms at the same or better alignment quality. We also demonstrate scalability: throughput improves simply by adding machines.
|
Keyword |
sequence alignment algorithm
graph
distributed system
next-generation sequencing (NGS)
|